AITopics | variance-adaptive linear bandit

Collaborating Authors

variance-adaptive linear bandit

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Neural Information Processing SystemsDec-23-2025, 17:33:33 GMT

horizon-free linear mixture mdp, improved regret analysis, variance-adaptive linear bandit, (5 more...)

Neural Information Processing Systems

Industry: Education (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Neural Information Processing SystemsOct-9-2024, 11:54:09 GMT

In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees yet is challenging because variances are often not known a priori. Recently, considerable progress has been made by Zhang et al. (2021) where they obtain a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that improve their regret bounds significantly. For linear bandits, we achieve \tilde O(\min\{d\sqrt{K}, d {1.5}\sqrt{\sum_{k 1} K \sigma_k 2}\} d 2) where d is the dimension of the features, K is the time horizon, and \sigma_k 2 is the noise variance at time step k, and \tilde O ignores polylogarithmic dependence, which is a factor of d 3 improvement. For linear mixture MDPs with the assumption of maximum cumulative reward in an episode being in [0,1], we achieve a horizon-free regret bound of \tilde O(d \sqrt{K} d 2) where d is the number of base models and K is the number of episodes.

horizon-free linear mixture mdp, improved regret analysis, variance-adaptive linear bandit, (2 more...)

Neural Information Processing Systems

Industry: Education (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Kim, Yeoneung, Yang, Insoon, Jun, Kwang-Sung

arXiv.org Machine LearningNov-5-2021

In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees yet is challenging because variances are often not known a priori. Recently, a considerable progress has been made by Zhang et al. (2021) where they obtain a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that improve their regret bounds significantly. For linear bandits, we achieve $\tilde O(d^{1.5}\sqrt{\sum_{k}^K \sigma_k^2} + d^2)$ where $d$ is the dimension of the features, $K$ is the time horizon, and $\sigma_k^2$ is the noise variance at time step $k$, and $\tilde O$ ignores polylogarithmic dependence, which is a factor of $d^3$ improvement. For linear mixture MDPs, we achieve a horizon-free regret bound of $\tilde O(d^{1.5}\sqrt{K} + d^3)$ where $d$ is the number of base models and $K$ is the number of episodes. This is a factor of $d^3$ improvement in the leading term and $d^6$ in the lower order term. Our analysis critically relies on a novel elliptical potential `count' lemma. This lemma allows a peeling-based regret analysis, which can be of independent interest.

data mining, machine learning, variance-adaptive linear bandit, (11 more...)

arXiv.org Machine Learning

2111.03289

Country:

North America > United States > Arizona (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.34)

Add feedback